Plotting software for MaP data

Patrick Irving, 5/19/2021

Vision:

To enable fast and easy exploration of MaP experimental data.

Possible Names

  • PlotMapper
  • MaP-ExPloRS (MaP data exploration and plotting on RNA Structures)
  • MaPplotlib (play on matplotlib, the python library for plotting)

Motivation

  • Weeks Lab GitHub has many highly specialized scripts.
    • plotting
    • filtering
    • file conversion
    • clipping structure cassettes
    • analysis
  • If you know what you want ahead of time, you can create a nice figure.
  • Data exploration is difficult because we have too many scripts and create too many files.

Solution: Jupyter Notebooks and plotmapper.py

Jupyter Notebooks come installed with Anaconda and are accessible on Longleaf through Open OnDemand.

plotmapper.py can be found in the JNBTools repo on GitHub.

Jupyter Notebooks

  • Really nice for anybody doing data analysis.
  • Makes your analysis easily reproducible.
  • Do everything in one place.
  • Text, code, and figures, all together.
  • Exports to PDF, HTML, and HTML slide shows for sharing.
  • This presentation is a Jupyter Notebook.

plotmapper.py

  • Makes three tasks easy:
    • Filtering data.
    • Analyzing data.
    • Plotting data.

Filtering:

  • Fits data by sequence
    • Done automatically any time you want to compare data.
    • no more clipping/padding for structure cassettes
    • not limited to structure cassettes
  • Filter by any column in your data tables
    • Statistic, Z-score, Percentile, Deletion Rate, Read Depth, etc.
  • Filter by contact distances
  • Filter by 3-D distances
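
To illustrate what fitting data by sequence could look like, here is a minimal sketch. It is not plotmapper's implementation: the function names fit_by_sequence and remap_positions are hypothetical, and it assumes the shorter sequence occurs exactly once, without mismatches, inside the longer one.

```python
def fit_by_sequence(reference, query):
    """Return the 0-based offset of `query` inside `reference`, or None.

    Hypothetical helper: case and T/U differences are ignored so that a
    DNA-style FASTA can still be matched against an RNA sequence.
    """
    ref = reference.upper().replace("T", "U")
    qry = query.upper().replace("T", "U")
    offset = ref.find(qry)
    return offset if offset >= 0 else None


def remap_positions(positions, offset):
    """Shift 1-based positions of the query into reference numbering."""
    return [p + offset for p in positions]


# A made-up construct: 5' cassette + RNA of interest + 3' cassette.
five_prime = "GGCC"
rna_of_interest = "GAUCGGAUCG"
three_prime = "AAUU"
with_cassettes = five_prime + rna_of_interest + three_prime

# The same RNA written as lowercase DNA, without cassettes.
without_cassettes = "gatcggatcg"

offset = fit_by_sequence(with_cassettes, without_cassettes)
print(offset)
print(remap_positions([1, 5, 10], offset))
```

Once the offset is known, data from both files can be compared in one numbering, which is why no manual clipping or padding is needed.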

Plotting:

plotmapper.py includes a variety of tools for plotting:

  • ShapeMapper QC data:
    • mutations per molecule
    • read length distribution
  • 1-D Reactivity data:
    • SHAPE-MaP
    • DANCE-MaP
  • 2-D correlation data:
    • Rings
    • Pairs
    • Deletions
    • DANCE-MaP

Installation is simple

I'm happy to help with this. Instructions are on the GitHub page.

Notebook Setup

The first code cell of a notebook should define defaults and load modules.

For high-level plotting functions, you only need to import plotmapper.

For this demonstration, I also need matplotlib.pyplot.

In [1]:
# Display plots in-line
%matplotlib inline

# import modules
import plotmapper as MaP
import matplotlib.pyplot as plt

Initializing MaP.Sample

MaP.Sample is the core object in this package. For each MaP experimental sample, it holds the following information:

  • Sample name
  • Base-pairing information (.ct)
  • Secondary Structure (.xrna, .varna, .cte, .nsd)
  • Tertiary Structure (.pdb)
    • requires PDB entry name
  • ShapeMapper Log file
  • ShapeMapper Profile
  • RingMapper data
  • PairMapper data
  • DANCE-MaP prefix:
    • Finds: reactivities, pairs, rings, allcorrs, and ct files if present.
  • SHAPE-JuMP deletions data
    • requires a reference fasta file
In [2]:
example1 = MaP.Sample(sample="example1",
                      profile = 'data/example1_rnasep_profile.txt',
                      ct = 'data/RNaseP.ct',
                      ss = 'data/RC_CRYSTAL_STRUCTURE.xrna',
                      rings = 'data/example1-rnasep.corrs',
                      pairs = 'data/example1-rnasep-pairmap.txt',
                      log = 'data/example1_shapemapper_log.txt',
                      dance_prefix = 'data/example1_rnasep',
                      deletions = 'data/example-rnasep-deletions.txt',
                      fasta = 'data/RNaseP-noSC.fasta',
                      pdb = 'data/3dhs_Correct.pdb',
                      pdb_name = '3dhs')
example2 = MaP.Sample(sample="example2",
                      profile = 'data/example2_rnasep_profile.txt',
                      ct = 'data/RNaseP.ct',
                      ss = 'data/RC_CRYSTAL_STRUCTURE.xrna',
                      rings = 'data/example2-rnasep.corrs',
                      pairs = 'data/example2-rnasep-pairmap.txt',
                      log = 'data/example2_shapemapper_log.txt',
                      dance_prefix = 'data/example2_rnasep',
                      deletions = 'data/example-rnasep-deletions.txt',
                      fasta = 'data/RNaseP-noSC.fasta',
                      pdb = 'data/3dhs_Correct.pdb',
                      pdb_name = '3dhs')
In [3]:
example3 = MaP.Sample(sample="example3",
                      profile = 'data/example3_rnasep_profile.txt',
                      ct = 'data/RNaseP.ct',
                      ss = 'data/RC_CRYSTAL_STRUCTURE.xrna',
                      rings = 'data/example3-rnasep.corrs',
                      pairs = 'data/example3-rnasep-pairmap.txt',
                      log = 'data/example3_shapemapper_log.txt',
                      dance_prefix = 'data/example3_rnasep',
                      deletions = 'data/example-rnasep-deletions.txt',
                      fasta = 'data/RNaseP-noSC.fasta',
                      pdb = 'data/3dhs_Correct.pdb',
                      pdb_name = '3dhs')
example4 = MaP.Sample(sample="example4",
                      profile = 'data/example4_rnasep_profile.txt',
                      ct = 'data/RNaseP.ct',
                      ss = 'data/RC_CRYSTAL_STRUCTURE.xrna',
                      rings = 'data/example4-rnasep.corrs',
                      pairs = 'data/example4-rnasep-pairmap.txt',
                      log = 'data/example4_shapemapper_log.txt',
                      dance_prefix = 'data/example4_rnasep',
                      deletions = 'data/example-rnasep-deletions.txt',
                      fasta = 'data/RNaseP-noSC.fasta',
                      pdb = 'data/3dhs_Correct.pdb',
                      pdb_name = '3dhs')
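
The four cells above differ only in the sample number, so the arguments can be generated instead of typed out. The sketch below only builds the keyword dictionaries. The helper name sample_files is hypothetical, and the file-naming pattern is assumed from the four examples above.

```python
def sample_files(n, data_dir="data"):
    """Build the keyword arguments used above for example number `n`.

    Hypothetical helper: the keyword names and file names are copied from
    the cells above; only the sample number changes.
    """
    return {
        "sample": f"example{n}",
        "profile": f"{data_dir}/example{n}_rnasep_profile.txt",
        "ct": f"{data_dir}/RNaseP.ct",
        "ss": f"{data_dir}/RC_CRYSTAL_STRUCTURE.xrna",
        "rings": f"{data_dir}/example{n}-rnasep.corrs",
        "pairs": f"{data_dir}/example{n}-rnasep-pairmap.txt",
        "log": f"{data_dir}/example{n}_shapemapper_log.txt",
        "dance_prefix": f"{data_dir}/example{n}_rnasep",
        "deletions": f"{data_dir}/example-rnasep-deletions.txt",
        "fasta": f"{data_dir}/RNaseP-noSC.fasta",
        "pdb": f"{data_dir}/3dhs_Correct.pdb",
        "pdb_name": "3dhs",
    }


all_kwargs = [sample_files(n) for n in range(1, 5)]
for kwargs in all_kwargs:
    print(kwargs["sample"], kwargs["profile"])

# In the notebook, with plotmapper imported as MaP, this could become:
# samples = [MaP.Sample(**kwargs) for kwargs in all_kwargs]
```

Keeping the samples in a list also matches how the multi-sample functions are called later, which take a list of samples as their first argument.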

High-level plotting functions

  • Single sample plotting:
    • sample.make_plot(arguments)
  • Multi-sample plotting:
    • MaP.array_plot(samples, arguments)
  • Replace plot with one of:
    • log_qc
    • shapemapper
    • skyline
    • dance_skyline
    • heatmap
    • ap
    • ss
    • 3d

ShapeMapper QC

  • make_log_qc (high-level function)
    • plot_log_MutsPerMol
    • set_log_MutsPerMol
    • make_log_MutsPerMol
    • plot_log_ReadLength
    • set_log_ReadLength
    • make_log_ReadLength
    • get_boxplot_data
    • plot_boxplot
  • array_qc
In [4]:
example2.make_log_qc();
In [5]:
MaP.array_qc([example1, example2, example3, example4]);

Linear Regressions

  • plot_regression
In [6]:
fig, ax = plt.subplots(1,2, figsize=(14,7))
example2.plot_regression(example1, ax=ax[0])
example4.plot_regression(example3, ax=ax[1], colorby="nucleotide")
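
For reference, the numbers behind a regression plot like this can be computed by hand. This is a minimal sketch of an ordinary least-squares fit between two reactivity profiles, not plot_regression itself. The function name and the made-up profiles are assumptions, and whether plot_regression transforms the data first is not stated here.

```python
import math


def linear_regression(x, y):
    """Least-squares fit of y = slope * x + intercept, skipping NaN pairs."""
    # Profiles often contain NaN for nucleotides without data.
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


nan = float("nan")
profile_a = [0.1, 0.5, nan, 1.2, 2.0]   # made-up reactivities
profile_b = [0.2, 1.0, 0.7, 2.4, 4.0]   # exactly 2x profile_a where defined
slope, intercept, r_squared = linear_regression(profile_a, profile_b)
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r_squared:.2f}")
```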

Classic ShapeMapper Plots

  • make_shapemapper
    • plot_sm_profile
    • plot_sm_depth
    • plot_sm_rates
In [7]:
example2.plot_sm_profile();
In [8]:
example2.plot_sm_rates();
In [9]:
example2.plot_sm_depth();
In [10]:
example2.make_shapemapper();

Skyline Plots

  • make_skyline
  • make_dance_skyline
    • get_skyline_figsize
    • plot_skyline
    • plot_sequence
  • array_skyline
In [11]:
example2.make_skyline();
In [12]:
MaP.array_skyline([example1, example2, example3, example4]);
In [13]:
example2.make_dance_skyline();

Colorbars

The plots I'll be showing don't have colorbars yet. To get a stand-alone colorbar, use the view_colormap() function:

In [15]:
MaP.view_colormap("pairs")
MaP.view_colormap("rings")
MaP.view_colormap("deletions")
MaP.view_colormap("deletions", metric="Distance")

Heatmap and Contour Plots

  • make_heatmap
    • get_distance_matrix (This is not speedy yet for contact distances.)
    • plot_contour_distances
    • plot_heatmap_data
In [16]:
fig, ax = plt.subplots(1, 2, figsize=(14, 7))
example2.make_heatmap("deletions", "pdb", ax=ax[0])
example2.make_heatmap("deletions", "ct", ax=ax[1]);
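
On contact distances: one common definition is the length of the shortest path through the secondary structure, where both backbone neighbors and base-pairing partners are one step apart. The sketch below computes this with a breadth-first search. It assumes that definition and is not the get_distance_matrix code; the function name and the hairpin are made up for illustration.

```python
from collections import deque


def contact_distances(length, base_pairs, start):
    """Contact distance from nucleotide `start` to every other nucleotide.

    Positions are 1-based. `base_pairs` is a list of (i, j) tuples.
    Backbone neighbors and base-pair partners each count as one step.
    """
    partner = {}
    for i, j in base_pairs:
        partner[i] = j
        partner[j] = i

    distance = {start: 0}
    queue = deque([start])
    while queue:
        i = queue.popleft()
        neighbors = [i - 1, i + 1]
        if i in partner:
            neighbors.append(partner[i])
        for j in neighbors:
            if 1 <= j <= length and j not in distance:
                distance[j] = distance[i] + 1
                queue.append(j)
    return distance


# A 10-nucleotide hairpin with three base pairs.
hairpin_pairs = [(1, 10), (2, 9), (3, 8)]
from_1 = contact_distances(10, hairpin_pairs, start=1)
print(from_1[10], from_1[7], from_1[6])
```

Each nucleotide has at most three neighbors, so one search is linear in the sequence length, and a full distance matrix needs one search per nucleotide.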

Arc Plots

  • make_ap
    • add_arc
    • get_ap_figsize
    • set_ap
    • plot_ap_ct
    • plot_ap_ctcompare
    • plot_ap_profile
    • plot_ap_data
  • array_ap
    • make_ap
In [17]:
example2.make_ap(attribute="deletions", Percentile=0.95);
In [18]:
MaP.array_ap([example1, example2, example3, example4], attribute="rings", cdAbove=15);
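
The Percentile argument above filters the data before plotting. As a rough illustration only, the sketch below keeps rows whose value in a chosen column is at or above a nearest-rank percentile cutoff. The function name, the row format, and the exact cutoff rule are assumptions; plotmapper may define the percentile differently.

```python
def filter_by_percentile(rows, column, percentile):
    """Keep rows whose `column` value is at or above the given percentile.

    `percentile` is a fraction between 0 and 1. The cutoff uses a simple
    nearest-rank rule on the sorted values.
    """
    values = sorted(row[column] for row in rows)
    index = min(int(percentile * len(values)), len(values) - 1)
    cutoff = values[index]
    return [row for row in rows if row[column] >= cutoff]


# Made-up correlation table: one dict per (i, j) pair.
table = [{"i": n, "j": n + 20, "Statistic": float(n)} for n in range(1, 9)]

strongest = filter_by_percentile(table, "Statistic", 0.75)
print([(row["i"], row["j"]) for row in strongest])
```

The same pattern extends to any other column in the data table, such as Z-score or read depth, by changing the column name.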

Secondary Structure

  • make_ss
    • set_ss
    • plot_ss_structure
    • plot_ss_sequence
    • plot_ss_positions
    • set_3d_distances (if coloring by 3d distance)
    • plot_ss_data
  • array_ss
    • make_ss
In [19]:
example2.make_ss(attribute="rings");
In [20]:
MaP.array_ss([example1, example2, example3, example4], attribute="pairs");

3D molecule interactive plots

Controls:

  • click and drag to rotate
  • mouse scroll or right click to zoom
  • 3rd mouse button and drag to pan
In [22]:
example2.make_3d(attribute="deletions", metric="Distance", Percentile=0.99)

You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol

Out[22]:
<py3Dmol.view at 0x18a7ac98940>
In [23]:
MaP.array_3d([example1, example2, example3, example4], attribute="rings", Statistic=15)

Out[23]:
<py3Dmol.view at 0x18a7c27c1d0>

Review

PlotMapper and Jupyter Notebooks provide a fast and easy way to explore MaP and JuMP data sets.

  • Quality control
  • Skylines
  • Linear Regression scatter plots
  • Arc Plots
  • Heatmaps
  • Secondary Structure
  • 3D structure
  • etc.

Still left to do:

  • Improve look and readability of some figures.
  • Add functionality that is commonly used in the lab, but not by me.
  • Create a command line interface.
  • Taking requests.

What I need from the Weeks Lab

  • Testing: finding the cases where my code fails.
  • Ideas for improving the look and readability of plots.
  • Plotmapper is very extensible. New functions should be added to it instead of creating new scripts.